NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Wang, Yizhong; Ivison, Hamish; Dasigi, Pradeep; Hessel, Jack; Khot, Tushar; Chandu, Khyathi; Wadden, David; MacMillan, Kelsey; Smith, Noah; Beltagy, Iz; et al. (May 2024, NeurIPS)

Full Text Available
Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents

https://doi.org/10.18653/v1/2020.emnlp-main.160

Yauney, Gregory; Hessel, Jack; Mimno, David (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations. Such granular annotation is rare, expensive, and unavailable in most domain-specific contexts. In contrast, unlabeled multi-image, multi-sentence documents are abundant. Can lexical grounding be learned from such documents, even though they have significant lexical and visual overlap? Working with a case study dataset of real estate listings, we demonstrate the challenge of distinguishing highly correlated grounded terms, such as “kitchen” and “bedroom”, and introduce metrics to assess this document similarity. We present a simple unsupervised clustering-based method that increases precision and recall beyond object detection and image tagging baselines when evaluated on labeled subsets of the dataset. The proposed method is particularly effective for local contextual meanings of a word, for example associating “granite” with countertops in the real estate dataset and with rocky landscapes in a Wikipedia dataset.
Full Text Available
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

https://doi.org/10.18653/v1/D19-1210

Hessel, Jack; Lee, Lillian; Mimno, David (November 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))
Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.
Full Text Available
Something’s Brewing! Early Prediction of Controversy-causing Posts from Discussion Features

https://doi.org/10.18653/v1/N19-1166

Hessel, Jack; Lee, Lillian (January 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback. Our inclusion of the word “community” here is deliberate: what is controversial to some audiences may not be so to others. Using data from several different communities on reddit.com, we predict the ultimate controversiality of posts, leveraging features drawn from both the textual content and the tree structure of the early comments that initiate the discussion. We find that even when only a handful of comments are available, e.g., the first 5 comments made within 15 minutes of the original post, discussion features often add predictive capacity to strong content-and-rate-only baselines. Additional experiments on domain transfer suggest that conversation-structure features often generalize to other communities better than conversation-content features do.
Full Text Available
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets

https://doi.org/10.18653/v1/N18-1199

Hessel, Jack; Mimno, David; Lee, Lillian (June 2018, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
Full Text Available
Quantifying the Visual Concreteness of Words and Topics in Multimodal Datasets

Hessel, Jack; Lee, Lillian; Mimno, David (January 2018, North American Chapter of the Association for Computational Linguistics)

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
Full Text Available
